Improving the Performance of List Intersection
نویسندگان
چکیده
List intersection is a central operation, utilized excessively for query processing on text and databases. We present list intersection algorithms for an arbitrary number of sorted and unsorted lists tailored to the characteristics of modern hardware architectures. Two new list intersection algorithms are presented for sorted lists. The first algorithm, termed Dynamic Probes, dynamically decides the probing order on the lists exploiting information from previous probes at runtime. This information is utilized as a cache-resident microindex. The second algorithm, termed Quantile-based, deduces in advance a good probing order, thus avoiding the overhead of adaptivity and is based on detecting lists with non-uniform distribution of document identifiers. For unsorted lists, we present a novel hashbased algorithm that avoids the overhead of sorting. A detailed experimental evaluation is presented based on real and synthetic data using existing chip multiprocessor architectures with eight cores, validating the efficiency and efficacy of the proposed algorithms.
منابع مشابه
Improved Skips for Faster Postings List Intersection
Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval system is fundamentally based on: Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...
متن کاملImproved Skips for Faster Postings List Intersection
Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval system is fundamentally based on: Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...
متن کاملA heuristic method for combined optimization of layout design and cluster configuration in continuous productions
Facility layout problems have been generally solved either hierarchically or integrated into other phases of plant design. In this paper, a hybrid method is introduced so that clustering and facilities layout can be simultaneously optimized. Each cluster is formed by a group of connected facilities and selection of the most appropriate cluster configuration is aimed. Since exact method by MIP i...
متن کاملIntegrating Skips and Bitvectors for List Intersection
This thesis examines space-time optimizations of in-memory search engines. Search engines can answer queries quickly, but this is accomplished using significant resources in the form of multiple machines running concurrently. Improving the performance of search engines means reducing the resource costs, such as hardware, energy, and cooling. These saved resources can then be used to improve the...
متن کاملOn the Security of O-PSI a Delegated Private Set Intersection on Outsourced Datasets (Extended Version)
In recent years, determining the common information privately and efficiently between two mutually mistrusting parties have become an important issue in social networks. Many Private set intersection (PSI) protocols have been introduced to address this issue. By applying these protocols, two parties can compute the intersection between their sets without disclosing any information about compone...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- PVLDB
دوره 2 شماره
صفحات -
تاریخ انتشار 2009